
    Design and Implementation of MPICH2 over InfiniBand with RDMA Support

    For several years, MPI has been the de facto standard for writing parallel applications. One of the most popular MPI implementations is MPICH. Its successor, MPICH2, features a completely new design that provides better performance and more flexibility. To ensure portability, it has a hierarchical structure that allows porting to be done at different levels. In this paper, we present our experiences designing and implementing MPICH2 over InfiniBand. Because of its high performance and open standard, InfiniBand is gaining popularity in the area of high-performance computing. Our study focuses on optimizing the performance of MPI-1 functions in MPICH2. One of our objectives is to exploit Remote Direct Memory Access (RDMA) in InfiniBand to achieve high performance. We have based our design on the RDMA Channel interface provided by MPICH2, which encapsulates architecture-dependent communication functionality in a very small set of functions. Starting with a basic design, we apply different optimizations and also propose a zero-copy-based design. We characterize the impact of our optimizations and designs using microbenchmarks. We have also performed an application-level evaluation using the NAS Parallel Benchmarks. Our optimized MPICH2 implementation achieves 7.6 μs latency and 857 MB/s bandwidth, which are close to the raw performance of the underlying InfiniBand layer. Our study shows that the RDMA Channel interface in MPICH2 provides a simple yet powerful abstraction that enables high-performance implementations by exploiting RDMA operations in InfiniBand. To the best of our knowledge, this is the first high-performance design and implementation of MPICH2 on InfiniBand using RDMA support. Comment: 12 pages, 17 figures
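
    For readers unfamiliar with the RDMA Channel abstraction described above, the sketch below shows roughly what such a narrow porting interface looks like. It is a hypothetical illustration only: the type and function names are invented for this sketch and are not the actual MPICH2 symbols, but they capture the idea that a port needs little more than initialization plus vectored put/read operations over a per-connection channel.

        /* Hypothetical C sketch of an RDMA-channel-style porting interface.
         * The names below are illustrative, not the real MPICH2 identifiers. */
        #include <stddef.h>

        struct rdma_iov {            /* one scatter/gather element */
            void   *buf;
            size_t  len;
        };

        struct rdma_vc;              /* opaque per-connection (virtual channel) state */

        /* Bring up / tear down the architecture-dependent layer. */
        int rdma_channel_init(int *rank, int *size);
        int rdma_channel_finalize(void);

        /* Push as much of the iov as currently possible to the peer via RDMA;
         * the number of bytes accepted is returned through *nbytes. */
        int rdma_channel_put_datav(struct rdma_vc *vc,
                                   struct rdma_iov *iov, int count, int *nbytes);

        /* Read any data the peer has already deposited into local buffers. */
        int rdma_channel_read_datav(struct rdma_vc *vc,
                                    struct rdma_iov *iov, int count, int *nbytes);

    Keeping the interface this small is what allows the upper MPICH2 layers to be retargeted to a new interconnect without modification, which is the portability point the abstract makes.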

    Origin and tuning of the magnetocaloric effect for the magnetic refrigerant MnFe(P1-xGex)

    Neutron diffraction and magnetization measurements of the magnetic refrigerant Mn1+yFe1-yP1-xGex reveal that the ferromagnetic and paramagnetic phases correspond to two very distinct crystal structures, with the magnetic entropy change as a function of magnetic field or temperature being directly controlled by the phase fraction of this first-order transition. By tuning the physical properties of this system we have achieved a maximum magnetic entropy change exceeding 74 J/kg K for both increasing and decreasing field, more than twice the previous record value. Comment: 6 figures, one table
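
    For orientation, the field-induced magnetic entropy change quoted above is conventionally estimated from isothermal magnetization curves via the Maxwell relation; this is a textbook expression, not a formula taken from the paper itself:

        \Delta S_M(T, H_{\max}) = \int_{0}^{H_{\max}} \left( \frac{\partial M(T,H)}{\partial T} \right)_{H} \mathrm{d}H

    In a first-order system such as this one, the abrupt change of M with temperature across the structural transition is what produces a large |ΔS_M|, consistent with the phase-fraction picture described in the abstract.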

    Marine hydrographic spatial variability and its cause at the northern margin of the Amery Ice Shelf

    Conductivity, temperature and depth (CTD) data collected along a zonal hydrographic section from the northern margin of the Amery Ice Shelf on 25–27 February 2008 by the 24th Chinese National Antarctic Research Expedition (CHINARE) cruise in the 2007/2008 austral summer are analyzed to study thermohaline structures. The analysis reveals warm subsurface water in a limited area around the east end of the northern margin, where temperature, salinity and density exhibit east-west gradients in the surface layer of the section. The localization of the warm subsurface water and the causes of these surface-layer gradients are discussed. In addition, the results from these CTD data are compared with those from the 22nd CHINARE cruise in the 2005/2006 austral summer. The comparison reveals that the thermoclines and haloclines were deeper and weaker in the 2007/2008 austral summer. The differences between the two data sets can reasonably be attributed to changes in ocean-ice-atmosphere interactions at the northern margin of the Amery Ice Shelf.
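
    As a minimal illustration of the kind of profile analysis mentioned above, the sketch below locates a thermocline as the depth of maximum vertical temperature gradient in a single cast. The data values and function names are invented for this example; this is not the processing chain used by the CHINARE cruises.

        #include <math.h>
        #include <stdio.h>

        /* Return the depth (m) at which |dT/dz| is largest for one CTD cast.
         * depth[] is assumed monotonically increasing; n is the sample count. */
        static double thermocline_depth(const double *depth, const double *temp, int n)
        {
            double best_grad = 0.0, best_depth = depth[0];
            for (int i = 1; i < n; i++) {
                double dz = depth[i] - depth[i - 1];
                if (dz <= 0.0)
                    continue;                                 /* skip repeated or bad levels */
                double grad = fabs((temp[i] - temp[i - 1]) / dz);
                if (grad > best_grad) {
                    best_grad  = grad;
                    best_depth = 0.5 * (depth[i] + depth[i - 1]);  /* mid-point of the layer */
                }
            }
            return best_depth;
        }

        int main(void)
        {
            /* Toy cast: a cold surface layer over warmer subsurface water
             * (values are illustrative only). */
            double depth[] = {    0,   10,   20,   30,   40,   60,   80,  100 };
            double temp[]  = { -1.7, -1.7, -1.6, -1.2, -0.8, -0.6, -0.5, -0.5 };
            printf("sharpest thermocline near %.1f m depth\n",
                   thermocline_depth(depth, temp, 8));
            return 0;
        }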

    Building multirail infiniband clusters: Mpi-level design and performance evaluation

    InfiniBand is becoming increasingly popular in the area of cluster computing due to its open standard and high performance. However, even with InfiniBand, network bandwidth can become the performance bottleneck for some of today's most demanding applications. In this paper, we study the problem of overcoming the bandwidth bottleneck by using multirail networks. We present different ways of setting up multirail networks with InfiniBand (multiple HCAs, multiple ports, and virtual multirail configurations) and propose a unified MPI design that can support all of these approaches. We discuss important design issues, such as out-of-order message handling and support for multiple HCAs, and provide an in-depth discussion of different policies for using multirail networks. We also propose an adaptive striping scheme that can dynamically change the striping parameters based on current system conditions. We implement our design and evaluate it with microbenchmarks and applications. Our performance results show that multirail networks can significantly improve MPI communication performance. With a two-rail InfiniBand cluster, we achieve almost twice the bandwidth and half the latency for large messages compared with the original MPI implementation. Depending on the application communication pattern, the multirail MPI implementation can also significantly reduce communication time and total execution time. We further show that the adaptive striping scheme achieves excellent performance without a priori knowledge of individual rail bandwidth.
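
    As a rough sketch of the striping idea discussed above, the code below splits one large message across rails in proportion to per-rail weights; an adaptive scheme would keep updating those weights from observed rail bandwidth. The rail_send() stub and all names are hypothetical, and the real logic lives inside the MPI library, posting RDMA operations on the HCA/port backing each rail.

        #include <stdio.h>
        #include <stddef.h>

        #define MAX_RAILS 4

        /* Stub standing in for the per-rail transmit path; a real multirail MPI
         * would post an RDMA write on the rail's HCA/port instead of printing. */
        static void rail_send(int rail, const char *buf, size_t len)
        {
            (void)buf;
            printf("rail %d: %zu bytes\n", rail, len);
        }

        /* Stripe one large message across nrails rails in proportion to weight[],
         * which an adaptive scheme would update from measured rail bandwidth. */
        static void striped_send(const char *buf, size_t len,
                                 int nrails, const double weight[])
        {
            double total = 0.0;
            for (int r = 0; r < nrails; r++)
                total += weight[r];

            size_t off = 0;
            for (int r = 0; r < nrails; r++) {
                size_t chunk = (r == nrails - 1)
                             ? len - off                        /* last rail takes the remainder */
                             : (size_t)((double)len * weight[r] / total);
                if (chunk > 0)
                    rail_send(r, buf + off, chunk);
                off += chunk;
            }
        }

        int main(void)
        {
            static char msg[1 << 20];                           /* 1 MB message */
            double weight[MAX_RAILS] = { 1.0, 1.0 };            /* two rails, equal bandwidth */
            striped_send(msg, sizeof msg, 2, weight);
            return 0;
        }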

    High Performance RDMA-Based MPI Implementation over InfiniBand

    Although the InfiniBand Architecture is relatively new in the high-performance computing area, it offers many features that help improve the performance of communication subsystems. One of these features is support for Remote Direct Memory Access (RDMA) operations. In this paper, we propose a new design of MPI over InfiniBand that brings the benefits of RDMA not only to large messages, but also to small and control messages. We also achieve better scalability by exploiting application communication patterns and by combining send/receive operations with RDMA operations. Our RDMA-based MPI implementation currently delivers a latency of 6.8 microseconds for small messages and a peak bandwidth of 871 million bytes (831 MB) per second. Performance evaluation at the MPI level shows that for small messages, our RDMA-based design can reduce the latency by 24%, increase the bandwidth by over 104%, and reduce the host overhead by up to 22%. For large messages, we improve performance by reducing the time for transferring control messages. We have also shown that our new design is beneficial to MPI collective communication and to the NAS Parallel Benchmarks.
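
    A common way to extend RDMA benefits to small and control messages, broadly in line with what the abstract describes, is to RDMA-write each message into a pre-registered slot on the receiver and let the receiver poll a completion flag that is written last. The layout and names below are a generic sketch under that assumption, not the paper's exact protocol.

        #include <stdint.h>
        #include <stdio.h>
        #include <string.h>

        #define SLOT_SIZE 1024

        /* One pre-registered receive slot per peer. The sender RDMA-writes the
         * length and payload first and sets 'ready' last, so the receiver detects
         * arrival by polling a flag in its own memory rather than waiting on a
         * receive descriptor. */
        struct eager_slot {
            volatile uint32_t ready;          /* 0 = empty, 1 = message present */
            uint32_t          len;            /* payload length in bytes */
            char              payload[SLOT_SIZE];
        };

        /* Receiver side: spin on the flag, copy the payload out, release the slot.
         * (A real implementation would also handle memory ordering, multiple
         *  slots, and flow control for slot reuse.) */
        static uint32_t poll_slot(struct eager_slot *slot, char *out)
        {
            while (!slot->ready)
                ;                             /* busy-poll the completion flag */
            uint32_t len = slot->len;
            memcpy(out, slot->payload, len);
            slot->ready = 0;                  /* hand the slot back to the sender */
            return len;
        }

        int main(void)
        {
            static struct eager_slot slot;    /* stands in for registered memory */
            char out[SLOT_SIZE];

            /* Emulate the remote RDMA writes locally: payload and length first... */
            memcpy(slot.payload, "hello", 5);
            slot.len = 5;
            slot.ready = 1;                   /* ...completion flag last */

            uint32_t n = poll_slot(&slot, out);
            printf("received %u bytes: %.*s\n", n, (int)n, out);
            return 0;
        }

    Because the receiver only polls local memory, no receive descriptor or interrupt sits on the critical path, which is broadly where latency reductions for small messages in RDMA-based designs come from.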